AITopics | batch imitation learning

batch imitation learning

Strictly Batch Imitation Learning by Energy-based Distribution Matching

Neural Information Processing Systems, Dec-24-2025, 01:13:40 GMT

Consider learning a policy purely on the basis of demonstrated behavior, that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment. This problem arises wherever live experimentation is costly, such as in healthcare. One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting. But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient. We argue that a good solution should be able to explicitly parameterize a policy (i.e.

batch imitation learning, energy-based distribution matching, name change

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Review for NeurIPS paper: Strictly Batch Imitation Learning by Energy-based Distribution Matching

Neural Information Processing Systems, Jan-24-2025, 10:29:03 GMT

Additional Feedback:
- The authors note (with references) that the pure behavioral cloning approach performs poorly because it doesn't use information about the dynamics and state distributions of the problem. It would be useful if the authors could present a short, concrete example of exactly what type of information is lost when ignoring the MDP structure. On a first read, it feels like it implies that the offline setting means we have all the information we *need* from the start, which I think is the opposite of what the authors are trying to say.
- Line 112: This sentence immediately brings to mind a decision between parametric and non-parametric methods. I don't think that's what the authors are trying to say, so maybe the terminology of "parameterizing a policy" should be changed throughout the paper. If it is what the authors are trying to say, then it is not made clear why a parametric approach is the correct choice.

batch imitation learning, energy-based distribution matching, neurips paper

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Review for NeurIPS paper: Strictly Batch Imitation Learning by Energy-based Distribution Matching

Neural Information Processing Systems, Jan-24-2025, 10:28:56 GMT

The reviewers unanimously agree that the paper makes a nice contribution to imitation learning in the batch setting. That said, the paper has two major weaknesses: 1. During the discussion, the reviewers expressed confidence that the authors understand the mistake and know how to address it (see, e.g., the post-rebuttal update of R4). Therefore, we are recommending acceptance on the condition that the authors take this issue seriously, correct the technical mistake, and remove any incorrect or misleading claims associated with it. The authors are strongly encouraged to add such a comparison in the camera-ready version. On a related note, while the algorithm uses only (s,a) pairs as data, trajectory data is often available, from which one can extract (s,a,r,s') pairs.
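To make the distinction between (s,a) pairs and (s,a,r,s') pairs concrete, here is a minimal Python sketch of both extractions from a single trajectory. The trajectory layout (a list of state, action, reward steps plus a final state), the integer encoding, and the helper names `to_pairs` and `to_transitions` are assumptions made for illustration; they are not taken from the paper or the review. In the strictly batch setting described in the abstract, rewards may not be available at all.

```python
from typing import List, Tuple

# Hypothetical trajectory format (an assumption, not from the paper):
# a list of (state, action, reward) steps, plus the state reached at the end.
Step = Tuple[int, int, float]
Transition = Tuple[int, int, float, int]


def to_pairs(steps: List[Step]) -> List[Tuple[int, int]]:
    """Keep only (s, a); rewards and successor states are discarded."""
    return [(s, a) for s, a, _ in steps]


def to_transitions(steps: List[Step], final_state: int) -> List[Transition]:
    """Pair each step with its successor state to form (s, a, r, s')."""
    next_states = [s for s, _, _ in steps[1:]] + [final_state]
    return [(s, a, r, s2) for (s, a, r), s2 in zip(steps, next_states)]


# Toy three-step trajectory: 0 -> 1 -> 2 -> 3.
trajectory = [(0, 1, 0.0), (1, 0, 0.0), (2, 1, 1.0)]
pairs = to_pairs(trajectory)
transitions = to_transitions(trajectory, final_state=3)
```

The (s,a) view is what a method that ignores ordering would consume, while the (s,a,r,s') view retains the successor state that links consecutive steps.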

artificial intelligence, energy-based distribution matching, machine learning

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.98)
Information Technology > Artificial Intelligence > Robots (0.65)

Strictly Batch Imitation Learning by Energy-based Distribution Matching

Neural Information Processing Systems, Oct-10-2024, 05:32:15 GMT

Consider learning a policy purely on the basis of demonstrated behavior, that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment. This strictly batch imitation learning problem arises wherever live experimentation is costly, such as in healthcare. One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting. But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient. We argue that a good solution should be able to explicitly parameterize a policy (i.e.

algorithm, batch imitation learning, energy-based distribution matching

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.62)
